Exploring the dark foldable proteome by considering hydrophobic amino acids topology
Abstract
The protein universe is the set of all proteins found in all organisms. One way to explore it is through the domain content of proteins. However, parts of many sequences, and many entire sequences, remain unannotated even though the number of domain families is converging. The unannotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome using the hydrophobic cluster analysis methodology. These unannotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The unannotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and a particular topology of hydrophobic residues, higher propensity for disorder, and higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully comprehend and decipher the molecular evolution of the protein universe.
Similar resources
Performance of 2-Amino Tetraphenyl Porphyrin as Stationary Phase in RP-HPLC of Amino Acids
The search for new stationary phases has been one of the predominant concerns in high performance liquid chromatography (HPLC), with the aim of achieving better resolution, longer column life, and shorter analysis times. A chromatographic packing for the separation of underivatized amino acids (AAs) was prepared by dynamically coating 2-amino tetraphenyl porphyrin (atpp) on a C-18 reversed-pha...
Comprehensive Repertoire of Foldable Regions within Whole Genomes
In order to get a comprehensive repertoire of foldable domains within whole proteomes, including orphan domains, we developed a novel procedure, called SEG-HCA. From only the information of a single amino acid sequence, SEG-HCA automatically delineates segments possessing high densities in hydrophobic clusters, as defined by Hydrophobic Cluster Analysis (HCA). These hydrophobic clusters mainly ...
Prebiotic protein design supports a halophile origin of foldable proteins
There are significant challenges in forming testable hypotheses regarding abiogenesis (i.e., the origin of life); for example, the original environment on the early Earth during the process of abiogenesis is a matter of debate [although it was significantly different from the current environment (Oparin, 1952; Hazen et al., 2008)]. Furthermore, the process of abiogenesis occurred over a time sc...
Hydrophobic forces and the length limit of foldable protein domains.
To find the native conformation (fold), proteins sample a subspace that is typically hundreds of orders of magnitude smaller than their full conformational space. Whether such fast folding is intrinsic or the result of natural selection, and what is the longest foldable protein, are open questions. Here, we derive the average conformational degeneracy of a lattice polypeptide chain in water and...
Sequence-based predictions of membrane-protein topology, homology and insertion
Membrane proteins comprise around 20-30% of a typical proteome and play crucial roles in a wide variety of biochemical pathways. Apart from their general biological significance, membrane proteins are of particular interest to the pharmaceutical industry, being targets for more than half of all available drugs. This thesis focuses on prediction methods for membrane proteins that ultimately rely...